feat(plugins): import data from CSV and TSV files into a table (#1568) #1578

Merged
datlechin merged 5 commits into main from feat/1568-csv-import on Jun 5, 2026

Conversation

@datlechin
Member

Closes #1568.

Adds CSV and TSV import into a database table for any SQL target.

What it does

Pick File > Import > From CSV and choose a .csv or .tsv file. The row import sheet opens with CSV parsing options. Map columns to an existing table, or create a new table with inferred, editable types.

Options

  • Delimiter: auto-detect, comma, semicolon, tab, pipe
  • Quote character: double or single
  • Encoding: auto-detect, UTF-8, ISO Latin 1, Windows-1252
  • First row is a header
  • Trim leading and trailing spaces
  • Treat empty values as NULL, plus an optional NULL token
  • Shared with SQL and JSON import: on-error behavior, wrap in transaction, delete existing rows

Changing any dialect option re-reads the file so the column mapping reflects it.
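To make the auto-detect delimiter option concrete, here is a minimal sketch of one way such a guess could work. `guessDelimiter` is a hypothetical helper, not the logic inside `CSVDialect.detect(from:)`; it only counts the candidate delimiters that appear outside double quotes on the first line and picks the most frequent one.

```swift
// Hypothetical sketch: guess a delimiter from the first line of a file.
// This is NOT the PR's detection code; it only illustrates the idea.
func guessDelimiter(in firstLine: String,
                    candidates: [Character] = [",", ";", "\t", "|"]) -> Character {
    var counts: [Character: Int] = [:]
    var insideQuotes = false
    for character in firstLine {
        if character == "\"" {
            // A doubled quote ("") toggles twice, so the state stays correct.
            insideQuotes.toggle()
        } else if !insideQuotes, candidates.contains(character) {
            counts[character, default: 0] += 1
        }
    }
    // Most frequent candidate wins; earlier candidates win ties,
    // and the first candidate is the fallback when none occurs.
    var best: Character = candidates.first ?? ","
    var bestCount = 0
    for candidate in candidates {
        let count = counts[candidate, default: 0]
        if count > bestCount {
            best = candidate
            bestCount = count
        }
    }
    return best
}
```

A real detector would likely sample several rows and check that the field count is consistent across them; a single line is used here only to keep the sketch short.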

How it works

  • Reuses the existing CSV parser. CSVStreamingParser, CSVDialect, and CSVTypeInferrer move from CSVInspectorPlugin into TableProPluginKit as public types, so the importer and the inspector share one RFC 4180 tokenizer (quoted commas and newlines, doubled quotes, BOM, delimiter and encoding detection). Additive PluginKit ABI, no version bump.
  • New CSVImportPlugin bundle memory-maps the file, indexes rows, and inserts in 500-row parameterized batches. Memory is bounded by the row-range index, not the row data. Cancellable per batch, wrapped in a transaction.
  • JSONImportSheet becomes RowImportSheet, shared by JSON and CSV. The format plugin supplies the icon, name, and options view. JSONImportTypeMapper becomes ImportTypeMapper. A new fieldDetectionSignature hook (additive, defaults to empty) drives live re-detection when an option changes.
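The batching and parameterization described above can be sketched as follows. Both helpers are hypothetical and are not taken from the PR: `batches(of:size:)` splits a row index into fixed-size slices, and `insertStatement(table:columns:rowCount:)` builds a multi-row INSERT whose values are placeholders only. Identifier quoting and the `?` placeholder style are simplifying assumptions; real drivers differ.

```swift
// Hypothetical helper: split a row index into fixed-size batches.
func batches<T>(of items: [T], size: Int) -> [ArraySlice<T>] {
    precondition(size > 0, "batch size must be positive")
    return stride(from: 0, to: items.count, by: size).map { start in
        items[start..<min(start + size, items.count)]
    }
}

// Hypothetical helper: build one parameterized multi-row INSERT.
// Cell values never appear in the SQL text; they are bound separately.
// Assumption: identifiers are already safe (no quoting is done here).
func insertStatement(table: String, columns: [String], rowCount: Int) -> String {
    let placeholders = Array(repeating: "?", count: columns.count)
        .joined(separator: ", ")
    let tuples = Array(repeating: "(\(placeholders))", count: rowCount)
        .joined(separator: ", ")
    let columnList = columns.joined(separator: ", ")
    return "INSERT INTO \(table) (\(columnList)) VALUES \(tuples)"
}
```

With 1,200 indexed rows and a batch size of 500, this yields batches of 500, 500, and 200 rows, and each batch maps to one statement whose text depends only on the column list and row count.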

Tests

  • New CSVImportPluginTests: dialect resolution, header and header-less naming with dedup, NULL/empty/trim handling, ragged rows, type mapping, quoted/embedded/doubled-quote parsing, semicolon auto-detect.
  • The moved CSV parser suites and the JSON import suites still pass after the move. 116 cases green.

Notes

  • Inserts are parameterized, so CSV values are never concatenated into SQL.
  • Docs updated in docs/features/import-export.mdx; CHANGELOG entry under Unreleased.

@mintlify

mintlify Bot commented Jun 4, 2026

Preview deployment for your docs. Learn more about Mintlify Previews.

Project: TablePro · Status: 🟢 Ready · Preview: View Preview · Updated (UTC): Jun 4, 2026, 5:05 AM

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cd014d0c1f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +75 to +79
if settings.deleteExistingRows {
    try await sink.deleteAllRowsFromTargetTable()
}
if useTransaction {
    try await sink.beginTransaction()

P1: Start the transaction before deleting target rows

When a CSV import is run with both Delete existing rows and the default transactional rollback mode, the target table is cleared before beginTransaction() is called, so a later parse/insert error rolls back only the inserts and leaves the pre-existing data deleted. This makes the rollback option unsafe for the exact scenario where users are replacing a table from a CSV; put the delete inside the transaction when useTransaction is true.
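The ordering this comment asks for can be illustrated with a small recording mock. Everything here is a hypothetical sketch: `RecordingSink`, `insertBatch()`, `commit()`, and `rollback()` are invented for illustration, and the methods are synchronous for brevity, whereas the sink in the diff is async.

```swift
enum SinkCall: Equatable { case begin, deleteAll, insert, commit, rollback }

// Hypothetical mock that records the order of calls it receives.
final class RecordingSink {
    struct InsertFailed: Error {}
    private(set) var calls: [SinkCall] = []
    var failOnInsert = false

    func beginTransaction() { calls.append(.begin) }
    func deleteAllRowsFromTargetTable() { calls.append(.deleteAll) }
    func insertBatch() throws {
        if failOnInsert { throw InsertFailed() }
        calls.append(.insert)
    }
    func commit() { calls.append(.commit) }
    func rollback() { calls.append(.rollback) }
}

// The delete runs after beginTransaction(), so a rollback restores the rows.
func runImport(into sink: RecordingSink,
               deleteExistingRows: Bool,
               useTransaction: Bool) throws {
    if useTransaction { sink.beginTransaction() }
    do {
        if deleteExistingRows { sink.deleteAllRowsFromTargetTable() }
        try sink.insertBatch()
        if useTransaction { sink.commit() }
    } catch {
        if useTransaction { sink.rollback() }
        throw error
    }
}
```

On a failing insert the recorded sequence is begin, deleteAll, rollback, which means the delete is undone together with the inserts.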

    progress.incrementStatement(by: batch.count)
} catch {
    switch settings.errorHandling {
    case .stopAndRollback, .stopAndCommit:

P2: Honor Stop and Commit for transactional CSV imports

If the user selects Stop and Commit with Wrap in transaction enabled, the first failed batch takes this combined case and throws to the outer handler, which always rolls back while useTransaction is true. That makes Stop and Commit behave like Stop and Rollback for CSV imports instead of preserving the successfully inserted batches.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 856b29dfb5

Comment on lines +37 to +44
var used = Set<String>()
for index in 0..<columnCount {
    let raw = header.flatMap { index < $0.count ? $0[index] : nil } ?? ""
    let trimmed = raw.trimmingCharacters(in: .whitespacesAndNewlines)
    let base = trimmed.isEmpty ? defaultColumnName(index) : trimmed
    var unique = base
    var suffix = 2
    while !used.insert(unique).inserted {

P2: Deduplicate CSV headers case-insensitively

When a CSV header contains names that differ only by case (for example id,ID) and the user imports into an existing table, this case-sensitive used set leaves both source fields distinct in the mapping UI, but ImportDataSinkAdapter later lowercases source keys when building columnMapping. One mapping then overwrites the other, and both row fields can be routed to the same target column or one source column can be silently lost; make the generated CSV field names unique under the same case-folding used by the sink.
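A minimal sketch of the suggested fix, modeled on the excerpt above: the `used` set stores lowercased names, so `id` and `ID` collide and the second one receives a suffix. The `_2` suffix format and the `Column N` placeholder are assumptions for illustration; the PR's actual `defaultColumnName` and suffix style are not shown in the excerpt.

```swift
import Foundation

// Hypothetical variant of the naming loop that dedups under case folding.
func uniqueColumnNames(header: [String]?, columnCount: Int) -> [String] {
    var used = Set<String>()
    var names: [String] = []
    for index in 0..<columnCount {
        let raw = header.flatMap { index < $0.count ? $0[index] : nil } ?? ""
        let trimmed = raw.trimmingCharacters(in: .whitespacesAndNewlines)
        // Assumed placeholder format for blank or missing header cells.
        let base = trimmed.isEmpty ? "Column \(index + 1)" : trimmed
        var unique = base
        var suffix = 2
        // Track lowercased keys so names differing only by case collide.
        while !used.insert(unique.lowercased()).inserted {
            unique = "\(base)_\(suffix)"  // assumed suffix format
            suffix += 1
        }
        names.append(unique)
    }
    return names
}
```

For the header `id,ID` this produces two names that stay distinct even after a sink lowercases them, so neither mapping overwrites the other.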

Comment on lines +17 to +27

static func resolveDialect(in data: Data, options: CSVImportOptions) -> CSVDialect {
    var dialect = CSVDialect.detect(from: data)
    if let byte = options.delimiter.byte {
        dialect.delimiter = byte
    }
    dialect.quoteChar = options.quoteCharacter.byte
    if let forced = options.encoding.stringEncoding {
        dialect.encoding = forced
    }
    return dialect

P2: Avoid returning UTF-16 dialects to the byte parser

For a BOM-marked UTF-16 CSV, CSVDialect.detect(from:) sets .utf16LittleEndian/.utf16BigEndian, but CSVStreamingParser still scans delimiters and newlines as single bytes and advances only one byte past them. A UTF-16LE file such as a,b\n1,2 therefore leaves the delimiter's trailing NUL at the start of the next field, producing corrupted headers/values during detection and import; either transcode/reject UTF-16 before parsing or keep auto-detection to encodings the byte parser can tokenize correctly.
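One of the two options named in this comment, transcoding before parsing, might look like the sketch below. `normalizedForByteParser` is a hypothetical helper, not part of the PR: it checks for a UTF-16 byte order mark and, if one is present, re-encodes the payload as UTF-8 so that delimiters and newlines are single bytes again. It does not distinguish a UTF-32 little-endian BOM, which begins with the same two bytes.

```swift
import Foundation

// Hypothetical pre-pass: return UTF-8 bytes for BOM-marked UTF-16 input,
// pass other data through unchanged, and return nil if decoding fails.
func normalizedForByteParser(_ data: Data) -> Data? {
    let encoding: String.Encoding
    if data.starts(with: [0xFF, 0xFE]) {
        encoding = .utf16LittleEndian
    } else if data.starts(with: [0xFE, 0xFF]) {
        encoding = .utf16BigEndian
    } else {
        return data
    }
    // Drop the two BOM bytes, decode, then re-encode as UTF-8.
    let payload = data.dropFirst(2)
    guard let text = String(data: payload, encoding: encoding) else {
        return nil
    }
    return Data(text.utf8)
}
```

Transcoding copies the whole file into memory, which works against the memory-mapped design described in the PR; rejecting UTF-16 or limiting auto-detection, the other options the comment lists, avoid that cost.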

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 006fb24118

Comment on lines +128 to +130
let value = column < fields.count ? fields[column] : ""
guard !value.isEmpty else { continue }
samples[column].append(value)

P2: Apply CSV null and trim options before inferring types

When the user sets CSV options such as NULL text or Trim leading and trailing spaces, imports apply those conversions in cellValue, but field detection still feeds the raw token into samples. For a new-table import with values like 1,\N,2 after configuring NULL text = \N (or numeric values padded with spaces while trim is enabled), the inferred type becomes text even though the rows will be inserted as integers/nulls, so the generated table schema is wrong.
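The mismatch can be reproduced with a small sketch in which samples pass through the same normalization as imported cells before inference. `SampleOptions`, `normalizedCell`, and the two-case `InferredType` are hypothetical simplifications; the PR's `cellValue` and `CSVTypeInferrer` are not shown here and will differ.

```swift
import Foundation

// Hypothetical subset of the CSV options relevant to sampling.
struct SampleOptions {
    var trimWhitespace = true
    var emptyAsNull = true
    var nullToken: String? = nil
}

// Returns nil when the cell should be treated as NULL.
func normalizedCell(_ raw: String, options: SampleOptions) -> String? {
    let value = options.trimWhitespace
        ? raw.trimmingCharacters(in: .whitespaces)
        : raw
    if options.emptyAsNull && value.isEmpty { return nil }
    if let token = options.nullToken, value == token { return nil }
    return value
}

enum InferredType { case integer, text }

// Infer from normalized samples only, so NULLs and padding do not
// push a numeric column to text.
func inferType(of rawSamples: [String], options: SampleOptions) -> InferredType {
    let samples = rawSamples.compactMap { normalizedCell($0, options: options) }
    guard !samples.isEmpty else { return .text }
    return samples.allSatisfy { Int($0) != nil } ? .integer : .text
}
```

With the NULL token set to `\N`, the samples `1`, `\N`, and a space-padded `2` infer as integer; without the token, the `\N` sample forces text.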

Comment on lines +100 to +101
.onChange(of: currentPlugin?.fieldDetectionSignature) { _, _ in
    Task { await redetectFields() }

P2: Refresh new-table fields after inactive CSV option changes

This only reloads the currently selected destination when CSV detection options change, but .task has already populated newColumns and loadNewColumns() will later no-op while newColumnsLoaded is true. If a user changes a field-shaping option while on Existing table (for example disables First row is a header) and then switches to New table, the stale mapping from the old settings is used while the import runs with the new settings; in that header toggle case the new table is created with old header names but the imported rows are keyed as Column 1, Column 2, so the sink skips the values and the import can appear successful with no data inserted.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5dfafaf9fd

Comment on lines +191 to +193
case .skipAndContinue:
    for entry in batch {
        try await insert(entry.row, into: sink, at: entry.line, progress: progress,

P2: Retry only rows not already inserted

When Skip and Continue is used for a CSV import, retrying every row in the batch after insertRows throws can duplicate rows that were already committed by earlier chunks. ImportDataSinkAdapter.insertRows splits a batch by column shape and by maxBindParameters, so on MSSQL with more than 4 mapped columns (or any case where a later group/chunk fails) earlier chunk statements may have succeeded before the error reaches this catch; this loop then re-inserts those successful rows one-by-one instead of only isolating the failing chunk/rows.
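The arithmetic behind the "more than 4 mapped columns" remark can be sketched as follows. The helpers are hypothetical, and the limit of 2,100 parameters per request is SQL Server's documented maximum used here as an assumption; the value that `ImportDataSinkAdapter` actually applies is not shown in this conversation.

```swift
// Hypothetical helper: how many rows fit in one statement under a
// bind-parameter limit (at least one row, even for very wide tables).
func rowsPerStatement(columnCount: Int, maxBindParameters: Int) -> Int {
    precondition(columnCount > 0, "need at least one column")
    return max(1, maxBindParameters / columnCount)
}

// Hypothetical helper: sizes of the statements one batch is split into.
func chunkSizes(rowCount: Int, columnCount: Int, maxBindParameters: Int) -> [Int] {
    let perStatement = rowsPerStatement(columnCount: columnCount,
                                        maxBindParameters: maxBindParameters)
    return stride(from: 0, to: rowCount, by: perStatement).map { start in
        min(perStatement, rowCount - start)
    }
}
```

A 500-row batch with 4 columns needs 2,000 parameters and fits in one statement. With 5 columns it needs 2,500, so it splits into statements of 420 and 80 rows; if the second fails after the first succeeded, retrying the whole batch row by row would insert the first 420 rows twice.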

…ixes rollback, duplicates, and stop-and-commit
@datlechin
Member Author

Pushed 8578ce1 addressing the review findings:

Shared import loop extracted to PluginKit. New RowImportRunner owns the transaction lifecycle, batching, error modes, and progress for both JSON and CSV import (additive ABI, no version bump). The duplicated performImport/flush/insert bodies in both plugins are gone.

Behavior fixes (apply to JSON import too, CHANGELOG entries added):

  • "Delete existing rows" now runs inside the transaction, so a failed import restores the deleted rows.
  • Skip-and-continue inserts row by row from the start instead of retrying a partially written batch, which could duplicate rows (insertRows is multi-statement when chunked by bind-parameter limits).
  • "Stop and Commit" commits the rows inserted before the error instead of rolling them back. Cancellation still rolls back in every mode.
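The three error modes described above can be summarized in a simplified sketch. This is not the `RowImportRunner` from the PR: `run(rows:mode:insert:)` and `Outcome` are hypothetical, rows are plain integers, inserts go row by row in every mode, and batching, the error cap, and cancellation are left out.

```swift
enum ErrorHandling { case stopAndRollback, stopAndCommit, skipAndContinue }

enum Outcome: Equatable {
    case committed(inserted: Int, skipped: Int)
    case rolledBack
}

// Hypothetical loop: the outcome stands in for the final
// commit or rollback call on a real sink.
func run(rows: [Int],
         mode: ErrorHandling,
         insert: (Int) throws -> Void) -> Outcome {
    var inserted = 0
    var skipped = 0
    for row in rows {
        do {
            try insert(row)
            inserted += 1
        } catch {
            switch mode {
            case .stopAndRollback:
                // Discard everything, including rows inserted so far.
                return .rolledBack
            case .stopAndCommit:
                // Keep the rows inserted before the error.
                return .committed(inserted: inserted, skipped: 0)
            case .skipAndContinue:
                // Each row is attempted once, so nothing is duplicated.
                skipped += 1
            }
        }
    }
    return .committed(inserted: inserted, skipped: skipped)
}
```

For four rows where the third fails, Stop and Rollback keeps nothing, Stop and Commit keeps the first two, and Skip and Continue keeps three and reports one skipped.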

Review finding 3: field detection now applies the trim and NULL-token options to samples, so inferred types and sample values match what the import inserts. The detection signature keys are all meaningful now.

Review finding 2: decode keeps the lossless Latin-1 fallback but drops the unreachable ?? "" for an explicit total byte-to-scalar mapping.
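A total byte-to-scalar mapping of this kind can be sketched as below. The function names are hypothetical and the PR's `decode` is not shown, so this only illustrates why no `?? ""` is needed: every byte 0x00 to 0xFF maps directly to the Unicode scalar with the same value, which is exactly ISO Latin 1, so the fallback cannot fail.

```swift
import Foundation

// Hypothetical fallback: map each byte to the scalar of the same value.
// Unicode.Scalar(UInt8) is non-failable, so this is total and lossless.
func decodeLatin1(_ bytes: [UInt8]) -> String {
    var scalars = String.UnicodeScalarView()
    scalars.append(contentsOf: bytes.map { Unicode.Scalar($0) })
    return String(scalars)
}

// Hypothetical decode: try UTF-8 first, fall back to Latin-1.
func decodeField(_ bytes: [UInt8]) -> String {
    String(bytes: bytes, encoding: .utf8) ?? decodeLatin1(bytes)
}
```

Valid UTF-8 input decodes as UTF-8, while a lone 0xE9 byte, which is invalid as UTF-8, falls back and decodes to "é".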

Also: blank CSV rows now advance the progress bar, and the unused plugin loggers are removed.

Tests: 7 new RowImportRunnerTests with a mock sink cover transaction ordering, rollback/commit per mode, no-batch-retry in skip mode, the error cap, and cancellation. 2 new detection tests cover trim and NULL-token sampling. CSVImportPluginTests, JSONImportPluginTests, and CSVInspectorTests all pass (102 cases across the affected suites). swiftlint lint --strict clean.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8578ce16b7

Comment on lines +118 to +120
let columnCount = header?.count
    ?? dataRanges.first.map { parser.parseRow(buffer, range: $0).count }
    ?? 0

P2: Include data columns beyond a short CSV header

When First row is a header is enabled and the header has fewer fields than subsequent rows, this fixes the source schema to header.count; later row(fields:columnNames:) only emits those columnNames, so a file like a,b\n1,2,3 silently drops the third value during both mapping and import. columnNames(header:columnCount:) already supports generating placeholder names for missing header cells, so derive the count from the header and sampled data rows instead of the header alone.
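The suggested derivation can be sketched in a few lines. `sourceColumnCount` is a hypothetical helper that takes already-parsed sample rows; in the PR the rows would come from `parser.parseRow` over a sample of `dataRanges`, and how many rows to sample is left open here.

```swift
// Hypothetical helper: the source schema is as wide as the widest of
// the header and the sampled data rows, not the header alone.
func sourceColumnCount(header: [String]?, sampledRows: [[String]]) -> Int {
    let widestRow = sampledRows.map { $0.count }.max() ?? 0
    return max(header?.count ?? 0, widestRow)
}
```

For the file `a,b\n1,2,3` this gives 3, so the third value receives a placeholder column name instead of being dropped.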

@datlechin datlechin merged commit 716b23f into main Jun 5, 2026
5 of 6 checks passed
@datlechin datlechin deleted the feat/1568-csv-import branch June 5, 2026 20:02

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature Request: Support import from csv

1 participant